PROVIDE MULTIPLE
SEQUENCE ALIGNMENT (e.g.,
alignment of family of related sequences)

PROVIDE
SEQUENCE DATABASE
(e.g., one or more lists of proteins)

GENERATE PHYSICAL CHEMICAL
PROPERTY (PCP) MOTIFS
(i.e., PCP signatures)




MINE DATA USING PCP MOTIFS TO
LOCATE RELATED SEQUENCE
SEGMENTS (e.g., use window scoring
function to detect related sequence
segments)

DETERMINE WHICH PROTEINS OF
THE SEQUENCE DATABASE
MOST RESEMBLE THE PHYSICAL- 
CHEMICAL PROPERTIES OF THE 
ORIGINAL FAMILY OF SEQUENCES 
USED TO GENERATE THE MULTIPLE 
SEQUENCE ALIGNMENT 

COLLECT DIVERSE RELATED
SEQUENCES (e.g., known protein family
desired to be used for query of sequence 
database) 






GENERATE MULTIPLE SEQUENCE 
ALIGNMENT 







PROVIDE THE MULTIPLE SEQUENCE 
ALIGNMENT (e.g., an alignment 
including a plurality of columns, each 
vertical column corresponding to a 
position horizontally in the multiple 
sequence alignment) 

PROVIDE MULTIDIMENSIONAL NUMERICAL PCP
DESCRIPTORS FOR EACH AMINO ACID BASED ON THE
PHYSICAL CHEMICAL PROPERTIES OF EACH AMINO ACID
(e.g., numerical PCP descriptors for each amino acid
including a numerical descriptor for each of a plurality of
eigenvectors)



DESCRIBE THE MULTIPLE SEQUENCE ALIGNMENT (e.g.,
each amino acid thereof) QUANTITATIVELY IN TERMS OF THE PCP
DESCRIPTORS AS A SERIES OF (n) EIGENVECTORS

VERTICALLY ANALYZE THE QUANTITATIVELY DESCRIBED
MULTIPLE SEQUENCE ALIGNMENT, ON A COLUMN BY COLUMN
BASIS, AS A FUNCTION OF THE CONSERVATION OF THE PCP
DESCRIPTORS IN EACH COLUMN IN THE SERIES OF
EIGENVECTORS. DETERMINE THE CONSERVATION IN EACH
COLUMN BASED ON STANDARD DEVIATION
AND RELATIVE ENTROPY
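
The column-by-column conservation step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the name `column_conservation`, the histogram binning (`n_bins`), the pseudocount, and the use of all descriptor values in the alignment as the background distribution are choices made here. Relative entropy is taken as the Kullback-Leibler divergence of the column's binned descriptor values from that background.

```python
import math
from collections import Counter


def column_conservation(column_values, background_values, n_bins=4):
    """Return (mean, standard deviation, relative entropy) for one column.

    column_values: descriptor values (for one eigenvector) of the residues
        in a single alignment column.
    background_values: descriptor values used as the reference distribution
        (assumed here to be all residues of the alignment).
    """
    n = len(column_values)
    mean = sum(column_values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column_values) / n)

    lo, hi = min(background_values), max(background_values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero bin width

    def bin_of(value):
        # Clamp so that values at or beyond the range edges stay in a valid bin.
        return max(0, min(int((value - lo) / width), n_bins - 1))

    col_counts = Counter(bin_of(v) for v in column_values)
    bg_counts = Counter(bin_of(v) for v in background_values)

    rel_entropy = 0.0
    for b, count in col_counts.items():
        p = count / n
        # Pseudocount of 1 per bin keeps q strictly positive (assumption).
        q = (bg_counts[b] + 1) / (len(background_values) + n_bins)
        rel_entropy += p * math.log(p / q)
    return mean, std, rel_entropy
```

A fully conserved column gives a standard deviation of zero and a high relative entropy, while a column distributed like the background gives a relative entropy near zero.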

DEFINING PCP MOTIFS AS CONSECUTIVE POSITIONS IN THE
MULTIPLE SEQUENCE ALIGNMENT WHERE SIGNIFICANT
CONSERVATION EXISTS (e.g., where the relative entropy exceeds a
predetermined limit)

GENERATE A SERIES OF PCP MOTIF PROFILE MATRICES FOR
EACH PCP MOTIF 

PROVIDE MULTIDIMENSIONAL NUMERICAL PCP
DESCRIPTORS FOR EACH AMINO ACID BASED ON THE
PHYSICAL CHEMICAL PROPERTIES OF EACH AMINO ACID
(e.g., the numerical PCP descriptors for each amino acid
including a numerical descriptor for each of a plurality of
eigenvectors)



DESCRIBE THE MULTIPLE SEQUENCE ALIGNMENT
QUANTITATIVELY IN TERMS OF THE PCP DESCRIPTORS
AS N DISTINCT PCP DESCRIBED ALIGNMENTS
CORRESPONDING TO THE N EIGENVECTORS, EACH DISTINCT
ALIGNMENT INCLUDING COLUMNS CORRESPONDING TO THE
COLUMNS (i.e., positions) OF THE MULTIPLE SEQUENCE
ALIGNMENT, WHERE EACH COLUMN INCLUDES NUMERICAL
PCP DESCRIPTORS (i.e., corresponding to the eigenvector being
defined) FOR EACH AMINO ACID (i.e., residue) IN THE COLUMN



ANALYZE EACH OF THE N PCP DESCRIBED ALIGNMENTS
COLUMN BY COLUMN TO GENERATE CONSERVATION
PROPERTY DATA FOR EACH COLUMN THEREOF (e.g., the
related standard deviation for the column, and a relative entropy
for the column)

SCAN (e.g., horizontally
analyzed) THE CONSERVATION PROPERTY DATA (e.g., the relative
entropy values generated for each column of the PCP described
alignment being analyzed) TO DETECT CONSECUTIVE POSITIONS
WHERE SIGNIFICANT CONSERVATION EXISTS

BASED ON THE DETECTED CONSECUTIVE POSITIONS AND USER
SPECIFIED GAP AND MINIMUM LENGTH LIMITS, DETERMINING
WHERE THE PCP MOTIFS ARE LOCATED IN THE MULTIPLE
SEQUENCE ALIGNMENT
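
The motif-location step above (consecutive conserved positions, subject to gap and minimum length limits) might be sketched as below. The function name `find_motifs` and the parameters `threshold`, `max_gap`, and `min_length` are hypothetical stand-ins for the predetermined entropy limit and the user-specified limits; the exact rules used by the method are not given in the flowchart.

```python
def find_motifs(rel_entropy, threshold, max_gap=1, min_length=3):
    """Return (start, end) column index pairs (end inclusive) for motifs.

    A column counts as conserved when its relative entropy exceeds
    `threshold`. Conserved columns separated by at most `max_gap`
    non-conserved columns are merged into one run, and runs shorter
    than `min_length` columns are discarded.
    """
    motifs = []
    start = last = None
    for i, value in enumerate(rel_entropy):
        if value <= threshold:
            continue  # not conserved; may be bridged as a gap later
        if start is not None and i - last - 1 > max_gap:
            # Gap too long: close the current run before starting a new one.
            if last - start + 1 >= min_length:
                motifs.append((start, last))
            start = None
        if start is None:
            start = i
        last = i
    if start is not None and last - start + 1 >= min_length:
        motifs.append((start, last))
    return motifs
```

Each returned pair marks the first and last alignment column of one candidate PCP motif.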





GENERATE A SERIES OF PCP MOTIF PROFILE MATRICES (e.g., a
multidimensional profile for each PCP motif
defined by a plurality of matrices, with each matrix thereof
corresponding to one of n eigenvectors, and further wherein each
of the matrices includes average values for each
position of the PCP motif for the corresponding eigenvector
and gives the relative entropy of
each position of the PCP motif for the corresponding eigenvector)
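
One way to picture a PCP motif profile is as one small table per eigenvector, holding an average value and a relative entropy for each motif position. The sketch below assumes that layout; the name `build_profile`, the dictionary keys, and the input shapes are illustrative only, and alignment gaps are ignored for simplicity.

```python
def build_profile(described_alignments, column_rel_entropy, start, end):
    """Build a motif profile covering alignment columns start..end (inclusive).

    described_alignments: one entry per eigenvector; each entry is a list of
        rows (one per aligned sequence) of numerical descriptor values.
    column_rel_entropy: one list per eigenvector with the relative entropy
        already computed for every alignment column.
    Returns one dict per eigenvector with per-position means and entropies.
    """
    profile = []
    for rows, entropies in zip(described_alignments, column_rel_entropy):
        means = [
            sum(row[col] for row in rows) / len(rows)
            for col in range(start, end + 1)
        ]
        profile.append({"mean": means, "rel_entropy": entropies[start:end + 1]})
    return profile
```

With n eigenvectors and a motif of length L, the result is n tables of L positions each.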

PROVIDE THE PCP MOTIF
PROFILE MATRICES USED TO 
DEFINE EACH PCP MOTIF 



PROVIDE SEQUENCE 
DATABASE TO BE 
SEARCHED (e.g., a plurality
of proteins)

CONVERT SEQUENCE DATABASE INTO
MATRICES CORRESPONDING TO
PCP MOTIF PROFILE MATRICES USING
THE PCP DESCRIPTORS (e.g., n x m
matrices, where n is the number of eigenvectors
and m is the length of the sequence)
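
Converting a sequence into an n x m matrix can be sketched as a table lookup per residue. The descriptor values in `DESCRIPTORS` below are placeholders invented for illustration (only four residues, two components), and `encode_sequence` is a hypothetical name; real PCP descriptors would cover all twenty amino acids.

```python
# Placeholder descriptors: residue -> one value per eigenvector.
# These numbers are illustrative only, not real PCP descriptor values.
DESCRIPTORS = {
    "A": (0.1, -0.3),
    "G": (0.2, -0.5),
    "K": (-0.6, 0.4),
    "L": (0.7, 0.1),
}


def encode_sequence(sequence, descriptors=DESCRIPTORS):
    """Return an n x m matrix (list of n rows, each of length m).

    n is the number of descriptor components (eigenvectors) and m is the
    sequence length; row k holds component k for every residue in order.
    """
    n = len(next(iter(descriptors.values())))
    return [[descriptors[residue][k] for residue in sequence] for k in range(n)]


matrix = encode_sequence("GAKL")  # 2 rows (eigenvectors) x 4 columns (residues)
```

Storing one row per eigenvector keeps the layout parallel to the per-eigenvector profile matrices.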

MATCHING VALUES OF THE PCP MOTIF PROFILE
MATRICES TO THE
CONVERTED SEQUENCE DATABASE USING A
POSITIONAL SCORING FUNCTION
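
A positional scoring function of this kind can be illustrated with a sliding window. The flowchart does not give the formula, so the sketch below assumes one: a weighted squared difference between the encoded sequence and the profile means, with the per-position weights standing in for relative entropy. Lower scores mean closer matches under this assumption, and the names `window_scores` and `best_window` are hypothetical.

```python
def window_scores(encoded, profile_means, profile_weights):
    """Score every window of the encoded sequence against one motif profile.

    encoded: n x m matrix for the sequence (n eigenvectors, m residues).
    profile_means / profile_weights: n x L matrices for a motif of length L.
    Returns one distance-like score per window start (lower is better).
    """
    n = len(profile_means)
    motif_len = len(profile_means[0])
    seq_len = len(encoded[0])
    scores = []
    for start in range(seq_len - motif_len + 1):
        score = 0.0
        for k in range(n):
            for j in range(motif_len):
                diff = encoded[k][start + j] - profile_means[k][j]
                score += profile_weights[k][j] * diff * diff
        scores.append(score)
    return scores


def best_window(encoded, profile_means, profile_weights):
    """Return (start, score) of the best window, or None if none fits."""
    scores = window_scores(encoded, profile_means, profile_weights)
    if not scores:
        return None  # sequence shorter than the motif
    start = min(range(len(scores)), key=scores.__getitem__)
    return start, scores[start]
```

Weighting by conservation lets strongly conserved motif positions dominate the match, which is one plausible reading of a "positional" score.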

IDENTIFY THE HIGHEST SCORING WINDOWS AS BEING
BEST MATCHES FOR THE PCP MOTIF PROFILE
MATRICES USED TO DEFINE EACH PCP MOTIF

PROVIDE FILE INCLUDING VALUES AND/OR
SEQUENCES OF THE SEGMENTS OF SEARCHED
SEQUENCES THAT ARE BEST MATCHED TO THE PCP
MOTIF PROFILE MATRICES USED TO DEFINE EACH PCP
MOTIF (e.g., for the family used to generate the multiple sequence
alignment)

PROVIDE FILE INCLUDING VALUES AND/OR
SEQUENCES OF THE SEGMENTS OF SEARCHED 
SEQUENCES THAT ARE BEST MATCHED TO THE PCP 
MOTIF PROFILE MATRICES USED TO DEFINE THE PCP 
MOTIFS 



LINK DATA FOR EACH FOUND SEGMENT FOR EACH 
PROTEIN IN THE SEQUENCE DATABASE 



DETERMINE AN OVERALL SIMILARITY DISTANCE SCORE
FOR EACH PROTEIN OF THE SEQUENCE DATABASE 
BEING SEARCHED BASED ON THE SCORE FOR EACH 
FOUND SEGMENT AND RELATIVE TO WHAT A RANDOM 
SCORE FOR THE FOUND SEGMENT WOULD BE 



RANK THE PROTEINS OF THE SEQUENCE DATABASE 
BEING SEARCHED ACCORDING TO OVERALL PCP 
SIMILARITY DISTANCE SCORE 
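
Scoring each protein relative to a random expectation and then ranking might look like the sketch below. The combination rule is an assumption: each segment's distance score is converted to a z-score against a random mean and standard deviation for that motif, and the z-scores are summed so that a higher total means a better match. The names `overall_score` and `rank_proteins`, and the protein labels in the usage lines, are hypothetical.

```python
def overall_score(segment_scores, random_means, random_stds):
    """Sum of per-motif z-scores; higher means better than random.

    segment_scores: best distance score found for each motif (lower is closer).
    random_means / random_stds: expected score and spread for a random
        segment, per motif (assumed to be estimated elsewhere).
    """
    return sum(
        (mu - score) / sd
        for score, mu, sd in zip(segment_scores, random_means, random_stds)
    )


def rank_proteins(protein_segment_scores, random_means, random_stds):
    """Return (protein, overall score) pairs sorted from best to worst."""
    totals = {
        name: overall_score(scores, random_means, random_stds)
        for name, scores in protein_segment_scores.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_proteins(
    {"P1": [1.0, 2.0], "P2": [4.0, 5.0]},  # hypothetical proteins
    random_means=[5.0, 5.0],
    random_stds=[2.0, 1.0],
)
```

Normalising by the random expectation keeps long or weakly conserved motifs from dominating the total simply because their raw scores are larger.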



PROVIDE OUTPUT OF PROTEINS WITH THE HIGHEST 
SCORE 

PROVIDE STRUCTURE FOR A PLURALITY OF THE
SEGMENTS THAT ARE BEST MATCHED TO THE PCP 
MOTIF PROFILE MATRICES USED TO DEFINE THE PCP 
MOTIFS 







CALCULATE THE SEGMENTAL RMSD BETWEEN THE 
QUERY STRUCTURES AND THE STRUCTURES OF THE 
BEST MATCHED SEGMENTS 
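
A segmental RMSD between a query segment and a matched segment can be sketched as below. This simplified version only removes the translation between the two coordinate sets (by centering each on its centroid) and does not perform an optimal rotational superposition, which a full structural comparison would normally include. The function name `segmental_rmsd` and the equal-length requirement are assumptions of this sketch.

```python
import math


def segmental_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Both segments are centered on their centroids first; no rotation is
    applied, so this is an upper bound on the optimally superposed RMSD.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("segments must be non-empty and of equal length")

    def centered(coords):
        n = len(coords)
        centroid = [sum(p[i] for p in coords) / n for i in range(3)]
        return [[p[i] - centroid[i] for i in range(3)] for p in coords]

    a, b = centered(coords_a), centered(coords_b)
    total = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return math.sqrt(total / len(a))
```

Two copies of the same segment that differ only by a shift give an RMSD of zero.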

PROVIDE QUERY STRUCTURES
RELATED TO THE SEGMENTS OF
THE MULTIPLE SEQUENCE
ALIGNMENT (e.g., one or more
segments of the sequences aligned, 
the consensus sequence of the 
alignment, etc.) 

RANK THE PROTEINS OF THE SEARCHED SEQUENCE
DATABASE ACCORDING TO THE STRUCTURAL 
SIMILARITY 



RANK THE PROTEINS OF SEARCHED SEQUENCE 
DATABASE ACCORDING TO THE STRUCTURAL 
SIMILARITY RANKING AND PCP SIMILARITY DISTANCE 
SCORING 

PROVIDE SCORING (e.g., ranking)
OF PROTEINS OF SEARCHED 
SEQUENCE DATABASE 
ACCORDING TO PCP SIMILARITY 
DISTANCE SCORING 

[Figure: panels with axis label "Score range" for motifs #1, #7, #8, #9, #10, and #11]